14 research outputs found
Demystifying GPT Self-Repair for Code Generation
Large Language Models (LLMs) have shown remarkable aptitude in code
generation but still struggle on challenging programming tasks. Self-repair --
in which the model debugs and fixes mistakes in its own code -- has recently
become a popular way to boost performance in these settings. However, the
literature offers only very limited study of how and when self-repair works
effectively, and one might wonder to what extent a model is really capable of
providing accurate feedback on why code is wrong when that code was generated
by the same model. In this paper, we analyze GPT-3.5 and GPT-4's
ability to perform self-repair on APPS, a challenging dataset consisting of
diverse coding challenges. To do so, we first establish a new evaluation
strategy dubbed pass@t that measures the pass rate of the tasks against the
total number of tokens sampled from the model, enabling a fair comparison to
purely sampling-based approaches. With this evaluation strategy, we find that
the effectiveness of self-repair is only seen in GPT-4. We also observe that
self-repair is bottlenecked by the feedback stage; using GPT-4 to give feedback
on the programs generated by GPT-3.5 and using expert human programmers to give
feedback on the programs generated by GPT-4, we unlock significant performance
gains.
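The pass@t idea above can be sketched concretely: charge every attempt (initial sample, feedback, and repair alike) against a shared token budget, and count a task as solved only if a passing program appears within that budget. The names below (`Attempt`, `pass_at_t`) are illustrative, not the paper's implementation.

```python
# Hypothetical sketch of a pass@t-style metric: pass rate as a function of
# the total number of tokens sampled, so that self-repair and pure sampling
# can be compared under the same budget.
from dataclasses import dataclass

@dataclass
class Attempt:
    tokens: int   # tokens spent producing this program (and any feedback)
    passed: bool  # whether the program passed the task's tests

def pass_at_t(task_attempts: list[list[Attempt]], budget: int) -> float:
    """Fraction of tasks solved within a token budget.

    task_attempts[i] is the ordered attempt history for task i.
    """
    if not task_attempts:
        return 0.0
    solved = 0
    for attempts in task_attempts:
        spent = 0
        for a in attempts:
            spent += a.tokens
            if spent > budget:
                break  # budget exhausted before this attempt finished
            if a.passed:
                solved += 1
                break
    return solved / len(task_attempts)
```

Under this accounting, a repair strategy only looks good if its extra feedback and repair tokens buy more passes than simply drawing fresh samples would.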
CodeExp: Explanatory Code Document Generation
Developing models that can automatically generate detailed code explanation
can greatly benefit software maintenance and programming education. However,
existing code-to-text generation models often produce only high-level summaries
of code that do not capture implementation-level choices essential for these
scenarios. To fill in this gap, we propose the code explanation generation
task. We first conducted a human study to identify the criteria for
high-quality explanatory docstrings for code. Based on that, we collected and
refined a large-scale code docstring corpus and formulated automatic evaluation
metrics that best match human assessments. Finally, we present a multi-stage
fine-tuning strategy and baseline models for the task. Our experiments show
that (1) our refined training dataset lets models achieve better performance on
the explanation generation task than unrefined data 15x its size, and (2)
fine-tuned models can generate well-structured long docstrings
comparable to human-written ones. We envision that our training dataset,
human-evaluation protocol, recommended metrics, and fine-tuning strategy will
boost future code explanation research. The code and annotated data are
available at https://github.com/subercui/CodeExp.
Comment: Accepted in Findings of EMNLP 2022
Fault-Aware Neural Code Rankers
Large language models (LLMs) have demonstrated an impressive ability to
generate code for various programming tasks. In many instances, LLMs can
generate a correct program for a task when given numerous trials. Consequently,
a recent trend is to sample a large number of programs from a model and then
filter and rank them based on their execution on a small number of known unit
tests in order to select one candidate solution. However, these approaches
assume that unit tests are given and that the generated programs can be safely
executed, even though they may perform arbitrary dangerous operations such as
file manipulation. Both of these assumptions are impractical in
real-world software development. In this paper, we propose CodeRanker, a neural
ranker that can predict the correctness of a sampled program without executing
it. Our CodeRanker is fault-aware, i.e., it is trained to predict different
kinds of execution information such as predicting the exact compile/runtime
error type (e.g., an IndexError or a TypeError). We show that CodeRanker can
significantly increase the pass@1 accuracy of various code generation models
(including Codex, GPT-Neo, GPT-J) on the APPS, HumanEval, and MBPP datasets.
Comment: In the proceedings of Advances in Neural Information Processing Systems, 2022
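The fault-aware labels such a ranker learns from could plausibly be produced by a one-time offline run over the training programs, recording whether each program succeeds or which error class it raises; at inference time the ranker predicts these labels without execution. The helper below (`label_program`) is an illustrative sketch of that labeling step, not the paper's pipeline, and real training data would also need sandboxing and timeouts.

```python
# Hypothetical sketch: derive a coarse fault label for a candidate Python
# program by executing it once, offline, and recording the outcome.
def label_program(src: str) -> str:
    """Return "ok" if the program runs to completion, else the name of the
    exception class it raises (e.g. "IndexError", "TypeError")."""
    try:
        # compile() surfaces SyntaxError; exec() surfaces runtime errors.
        exec(compile(src, "<candidate>", "exec"), {})
        return "ok"
    except Exception as e:
        return type(e).__name__
```

A classifier trained on (program, label) pairs like these can then rank unexecuted samples by their predicted probability of the "ok" label.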
The Three Pillars of Machine Programming
In this position paper, we describe our vision of the future of machine programming through a categorical examination of three pillars of research. Those pillars are: (i) intention, (ii) invention, and (iii) adaptation. Intention emphasizes advancements in the human-to-computer and computer-to-machine-learning interfaces. Invention emphasizes the creation or refinement of algorithms or core hardware and software building blocks through machine learning (ML). Adaptation emphasizes advances in the use of ML-based constructs to autonomously evolve software.
Neurosymbolic Learning for Robust and Reliable Intelligent Systems
This thesis shows that looking at intelligent systems through the lens of neurosymbolic models has several benefits over traditional deep learning approaches. Neurosymbolic models contain symbolic programmatic constructs, such as loops and conditionals, alongside continuous neural components. The symbolic part makes the model interpretable, generalizable, and robust, while the neural part handles the complexity of the intelligent systems. Concretely, this thesis presents two classes of neurosymbolic models, state machines and neurosymbolic transformers, and evaluates them on two case studies: reinforcement-learning-based autonomous systems and multi-robot systems. These case studies show that the learned neurosymbolic models are human-readable, extrapolate to unseen scenarios, and can handle robust objectives in the specification. To efficiently learn these neurosymbolic models, we introduce neurosymbolic learning algorithms that leverage the latest techniques from machine learning and program synthesis. (Ph.D. thesis)
Synthesis of domain specific Clause Normal Form encoders for bit-vector solvers
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 61-66).
SMT solvers are at the heart of a number of software engineering tools. These SMT solvers use a SAT solver as the back end and convert the high-level constraints given by the user down to low-level boolean formulas that can be efficiently mapped to CNF clauses and fed into a SAT solver. Current SMT solvers are designed to be general-purpose solvers suited to a wide range of problems. However, SAT solvers are very non-deterministic, and hence it is difficult to optimize a general-purpose solver across all problems. In this thesis, we propose a system that can automatically generate parts of SMT solvers in a way that is tailored to particular problem domains. In particular, we target the translation from high-level constraints to CNF clauses, one of the crucial parts of all SMT solvers. We achieve this goal by using a combination of program synthesis and machine learning techniques. We use a program synthesis tool called Sketch to generate optimal encoding rules for this translation, and then use auto-tuning to select only the subset of these encodings that actually improve performance for a particular class of problems. Using this technique, the thesis shows that we can improve upon the basic encoding strategy used by CVC4 (a state-of-the-art SMT solver), automatically generating variants of the solver tailored to different problem domains represented in the bit-vector benchmark suite from the 2015 SMT competition.
by Jeevana Priya Inala. M. Eng.
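To make the high-level-constraint-to-CNF translation concrete, here is a minimal sketch (not taken from the thesis) of one such encoding rule: a Tseitin-style encoding of a bitwise bit-vector AND, r = a & b, into CNF clauses in the usual DIMACS convention (positive integers for variables, negative for negated literals). The function name and variable layout are illustrative assumptions.

```python
# Illustrative sketch: encode the bit-vector constraint r = a & b as CNF.
# a, b, r are lists of CNF variable ids, one per bit, same width.
def encode_bv_and(a: list[int], b: list[int], r: list[int]) -> list[list[int]]:
    """For each bit i, encode r_i <-> (a_i AND b_i) as three clauses."""
    clauses = []
    for ai, bi, ri in zip(a, b, r):
        clauses.append([-ai, -bi, ri])  # (a_i AND b_i) -> r_i
        clauses.append([ai, -ri])       # r_i -> a_i
        clauses.append([bi, -ri])       # r_i -> b_i
    return clauses
```

An encoding synthesizer in the spirit of the thesis would search over alternative clause sets like this one (e.g., trading clause count against propagation strength) and keep whichever variant a tuner finds fastest on a given benchmark family.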